The Automatic Resolution of Prepositional Phrase-Attachment Ambiguities in German

Author

  • Martin Volk
Abstract

Any computer system for natural language processing has to struggle with the problem of ambiguities. If the system is meant to extract precise information from a text, these ambiguities must be resolved. One of the most frequent ambiguities arises from the attachment of prepositional phrases (PPs). A PP that follows a noun can be attached to the noun or to the verb. In this book we propose a method to resolve such ambiguities in German sentences based on cooccurrence values derived from a shallow-parsed corpus.

Corpus processing is therefore an important preliminary step. We introduce the modules for proper name recognition and classification, part-of-speech tagging, lemmatization, phrase chunking, and clause boundary detection. We processed a corpus of more than 5 million words from the Computer-Zeitung, a weekly computer science newspaper. All information compiled through corpus processing is annotated in the corpus. In addition to the training corpus, we prepared a 3000-sentence test corpus with manually annotated syntax trees. From this treebank we extracted over 4000 test cases with ambiguously positioned PPs for the evaluation of the disambiguation method. We also extracted test cases from the NEGRA treebank in order to check the domain dependency of the method.

The disambiguation method is based on the idea that a frequent cooccurrence of two words in a corpus indicates binding strength. In particular, we measure the cooccurrence strength between nouns (N) and prepositions (P) on the one hand, and between verbs (V) and prepositions on the other. The competing cooccurrence values of N+P versus V+P are compared to decide whether to attach a PP to the noun or to the verb. A language with variable word order like German poses special problems for determining the cooccurrence value between verb and preposition, since the verb may occur at different positions in a sentence. We tackle this problem with the help of a clause boundary detector to delimit the verb's access range. Still, the cooccurrence values for V+P are much stronger than for N+P. We need to counterbalance this inequality with a noun factor, which is computed from the general tendency of all prepositions to attach to verbs rather than to nouns. It is shown that this noun factor leads to the optimal attachment accuracy.

The method for determining the cooccurrence values is gradually refined by distinguishing sure and possible attachments, different verb readings, idiomatic and non-idiomatic usage, deverbal versus regular nouns, as well as the head noun from the prepositional phrase. In parallel we increase the coverage of the method by using various clustering techniques: lemmatization, core of compounds, proper name classes and the GermaNet thesaurus.

In order to evaluate the method we used the two test sets. We also varied the training corpus to determine its influence on the cooccurrence values. As the ultimate corpus, we tried cooccurrence frequencies from the WWW. Finally, we compared our method to another unsupervised method and to two supervised methods for PP attachment disambiguation. We show that intertwining our cooccurrence-based method with the supervised Back-off model leads to the best results: 81% correct attachments for the Computer-Zeitung test set.
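
The comparison of N+P and V+P cooccurrence values with a noun factor can be illustrated with a small sketch. This is not the author's implementation: the abstract gives no formulas, so the sketch assumes that a cooccurrence value is the relative frequency freq(word, preposition) / freq(word), that the noun factor is the ratio of the overall verb+preposition rate to the overall noun+preposition rate, and that the PP is attached to the noun when the scaled noun value is at least as large as the verb value. All function names and all counts are hypothetical toy data.

```python
def cooc(pair_freq, word_freq, word, prep):
    """Assumed cooccurrence value: freq(word, prep) / freq(word); 0.0 for unseen words."""
    total = word_freq.get(word, 0)
    if total == 0:
        return 0.0
    return pair_freq.get((word, prep), 0) / total


def noun_factor(noun_pair, noun_freq, verb_pair, verb_freq):
    """Assumed noun factor: overall verb+P rate divided by overall noun+P rate."""
    verb_rate = sum(verb_pair.values()) / sum(verb_freq.values())
    noun_rate = sum(noun_pair.values()) / sum(noun_freq.values())
    return verb_rate / noun_rate


def attach(verb, noun, prep, noun_pair, noun_freq, verb_pair, verb_freq):
    """Return 'noun' or 'verb' by comparing the scaled N+P value with the V+P value."""
    factor = noun_factor(noun_pair, noun_freq, verb_pair, verb_freq)
    n_val = cooc(noun_pair, noun_freq, noun, prep) * factor
    v_val = cooc(verb_pair, verb_freq, verb, prep)
    return "noun" if n_val >= v_val else "verb"


# Hypothetical toy counts, invented for illustration only.
NOUN_FREQ = {"Interesse": 100, "Rechner": 200}
NOUN_PAIR = {("Interesse", "an"): 40, ("Rechner", "mit"): 10}
VERB_FREQ = {"kaufen": 100, "zeigen": 100}
VERB_PAIR = {("kaufen", "mit"): 30, ("zeigen", "an"): 10, ("zeigen", "mit"): 20}

COUNTS = (NOUN_PAIR, NOUN_FREQ, VERB_PAIR, VERB_FREQ)

if __name__ == "__main__":
    print(attach("zeigen", "Interesse", "an", *COUNTS))
    print(attach("kaufen", "Rechner", "mit", *COUNTS))
```

With these toy counts, the pair "Interesse an" wins over "zeigen an", so the PP attaches to the noun, while "kaufen mit" wins over "Rechner mit", so the PP attaches to the verb. The refinements listed in the abstract (sure versus possible attachments, verb readings, clustering through lemmas, compound cores, name classes and GermaNet) would change how the counts are collected, not the shape of this comparison.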

Similar resources

Resolving Language and Vision Ambiguities Together: Joint Segmentation & Prepositional Attachment Resolution in Captioned Scenes

We present an approach to simultaneously perform semantic segmentation and prepositional phrase attachment resolution for captioned images. The motivation for this work comes from the fact that some ambiguities in language simply cannot be resolved without simultaneously reasoning about an associated image. If we consider the sentence “I shot an elephant in my pajamas”, looking at the language ...

Using the WWW to resolve PP attachment ambiguities

We have developed a method to resolve ambiguities in prepositional phrase (PP) attachment in German. We measure on the one hand the cooccurrence strength between nouns (N) and prepositions (P) and on the other hand between verbs (V) and prepositions. The competing values of N+P versus V+P are used to decide whether to attach a prepositional phrase to the noun or to the verb. We calculate the co...

Scaling up. Using the WWW to Resolve PP Attachment Ambiguities

We have developed a method to resolve ambiguities in prepositional phrase (PP) attachment in German. We measure on the one hand the cooccurrence strength between nouns (N) and prepositions (P) and on the other hand between verbs (V) and prepositions. The competing values of N+P versus V+P are used to decide whether to attach a prepositional phrase to the noun or to the verb. We calculate the co...

Hybrid Disambiguation of Prepositional Phrase Attachment and Interpretation

In this paper, a hybrid disambiguation method for the prepositional phrase (PP) attachment and interpretation problem is presented. The data needed, semantic PP interpretation rules and an annotated corpus, is described first. Then the three major steps of the disambiguation method are explained. Cross-validated evaluation results for German (88.6-94.4% correct for binary attachment ambigu...

Prepositional Phrase Attachment Problem Revisited: how Verbnet can Help

Resolving attachment ambiguities is a pervasive problem in syntactic analysis. We propose and investigate an approach to resolving prepositional phrase attachment that centers around the ways of incorporating semantic knowledge derived from the lexico-semantic ontologies such as VERBNET and WORDNET.

Publication date: 2002